<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<HTML>
<BODY>

<H1>The GemFire Distributed Lock Service</H1>

The GemFire Distributed Lock Service (also know as "dlock" or "DLS")
provides synchronization locks that span multiple members of a
distributed system.  The user creates a {@link
org.apache.geode.distributed.DistributedLockService} with a given
name and uses to acquire distributed locks.  Lock coordination and
management is provided by an instance of the distributed lock service
referred to as the "grantor".  Each lock service instance knows which
member is the grantor.  A member of the distributed system referred to
as the "elder" maintains a mapping between the name of a lock service
and the member who is the grantor.  The distributed lock service is
used to implement GemFire global Regions and distributed transactions.

<H2>Creating a distributed lock service</H2>

Before a member of the distributed system can create an instance of a
distributed lock service, it must contact the "elder" member to find
out which member hosts the lock service's "grantor".  The "elder"
member is defined as the oldest member in the JavaGroups view of the
distributed system.  The member that is creating the lock service
sends the elder a {@link
org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.GrantorRequestMessage} to
find out who the current grantor is.  If no member is currently the
grantor of the distributed lock service, then the member that sent the
request is chosen to be the grantor.

<P>

<CENTER>
<IMG SRC="{@docRoot}/javadoc-images/elder.jpg" WIDTH="435" HEIGHT="287">
</CENTER>

<P>

By centralizing the information and logic regarding which member hosts
a grantor, we avoid numerous race conditions and deadlocks when a new
lock service is created, when a lock service is destroyed, when a
member that hosts a grantor crashes, and when a lock service wishes to
<a href="#transfer-grantor">transfer</a> grantorship to itself.  Race
conditions are also reduced by minimizing the number of messages that
are sent to the elder.  For instance, we do not broadcast a message
when a member becomes grantor.  Instead, we let the other members of
the distributed system ask the elder for this information when they
need it.

<H2>Requesting a distributed lock</H2>

Once a distributed lock service has been created and initialized, it
can be used to obtain a distributed lock.  The member that wishes to
obtain a lock sends a {@link
org.apache.geode.distributed.internal.locks.DLockRequestProcessor.DLockRequestMessage} to the
lock service's grantor.  The grantor will process the lock request and
will grant the lock once it becomes available.

<P>

However, it is possible that the member that the requester sent the
{@link org.apache.geode.distributed.internal.locks.DLockRequestProcessor.DLockRequestMessage}
to is no longer the grantor.  That is, there can be a race between the
requester requesting the lock and the grantor lock service being
destroyed.  In this scenario, the member replies to the requester with
a <code>LockResponseMessage</code> that
indicates that the member is not the grantor.  The requester will then
send a {@link
org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.GrantorRequestMessage} to
the elder to determine who the new grantor is.

<P>

Another interesting race condition may occur when a requester learns
the identity of the grantor before the grantor member learns that it
is the grantor.  That is, the elder might inform a requester of the
identity of a grantor before the grantor member has fully initialized
(or has even heard back from the elder that it is the grantor).  This
race condition is resolved by blocking the processing of a {@link
org.apache.geode.distributed.internal.locks.DLockRequestProcessor.DLockRequestMessage} until
the member knows whether or not it is the grantor.

<H2>Lock Service Destruction</H2>

When an application has finished using a lock service, the service can
be {@linkplain
org.apache.geode.distributed.DistributedLockService#destroy
destroyed}.  When a non-grantor distributed lock service is destroyed,
the member destroying the lock service sends a
<code>DestroyServiceMessage</code> to the grantor.  Upon receiving the
message, the grantor cleans up all pending lock requests made by the
service being destroyed.  The grantor also responds to the pending
requests with a "you're destroyed" message indicating that the locks
were not granted because the member's service was destroyed.  The
thread destroying the non-grantor lock service blocks until it
receives an acknowledgment message from the grantor saying that the
grantor has cleaned up all pending requests.  This blocking prevents
race conditions if a member were to successively destroy and recreate
the same named lock service.

[What do we do about lock request messages that have been received by
the grantor, but have not been processed?  That is, the lock request
messages that are still in the message queue when the service destroy
message is received.]

<H3>Grantor Destruction</H3>

When a <i>grantor</i> lock service is destroyed, all lock-related
requests (including requests for lock releases and lock service
destructions) block until the grantor is completely destroyed.  Once
the grantor is destroyed, lock-related requests are replied to with a
"not grantor" message.

In a simple destruction algorithm, the grantor could alert the elder
that it is no longer grantor.  The next member to request information
about the grantor from the elder would be named the grantor.  This new
grantor would perform "grantor recovery" (see below) to determine
which members held which locks.  However, this algorithm is far from
optimal because it requires that each member of the distributed system
(remember, we do not keep track of which members have a lock service)
be contacted after the grantor has been destroyed.

In a more orderly destruction algorithm, the grantor being destroyed
selects a member to be its "successor" and instructs the successor to
request that it become grantor (see <a
href="#transfer-grantor">"Transferring Grantorship"</a> below).  The
grantor being destroyed chooses a successor based on number of locks
held by each member.  The successor is the member who holds the most
locks.  If two or more members hold the same number of locks, any one
of those members can be chosen as a successor.  The grantor being
destroyed sends a message to the successor indicating that the
successor should request that it become the new grantor.  The
grantor's destruction will not proceed until it receives a <code>TransferGrantorshipMessage</code>
indicating that another member is becoming the grantor (Note that this
message may not always come from the successor.  It may originate from
one of the "young turks" described below.), or until it receives a
message saying that the successor has refused to become grantor (This
may occur if the successor no longer has an instance of the lock
service.), or if the successor member leaves the distributed system.
The destroy operation will not return, and lock-related requests will
block, until the transfer of grantorship has completed.  The
successor-based algorithm minimizes the number of processes involved
in grantor destruction and the number of messages that must be sent
between processes.

If no locks are held when the grantor is destroyed, the grantor sends
the elder a "destroyed with no locks" message that instructs the elder
that there is no grantor.  The grantor does not bother to name a
successor because there is no good choice.  Since the elder knows that
no locks were held at the time of destruction, it can instruct the new
grantor (that is, the next member that requests information about the
grantor) that grantor recovery is not necessary.  After the grantor
being destroyed sends message, destruction is complete.

<H3>Grantor Recovery</H3>

When a new grantor is chosen, the elder informs the new grantor that
it is not the first grantor and that the new grantor must recover
lock-related information that was previously maintained by the old
grantor.  Before the newly-chosen grantor can service lock requests,
it has to recover information from other members of the distributed
lock service.  The new grantor sends <code>LockRecoveryMessage</code>s to all
members of the distributed system that respond with information about
which distributed locks, if any, they hold.

<P>

It has been noted that that performing grantor recovery after the
current grantor is cleanly destroyed is not optimal.  As an
optimization, the grantor being destroyed could choose a successor
grantor and transfer grantorship to it.  Since the grantor does not
maintain a list of which members have created the lock service, it
could choose a successor based on the members that have obtained
locks.

<A NAME="transfer-grantor"><H2>Transferring Grantorship</H2></A>

In some applications a single member of the distributed lock service
may want to acquire many locks in rapid succession.  We allow the user
to optimize this scenario by ensuring that the member that requests
the locks is also the grantor.  The grantor member can grant itself
locks faster that it can grant locks to other members because there is
no messaging involved.  The {@link
org.apache.geode.distributed.DistributedLockService#becomeLockGrantor}
transfers grantorship to a distributed lock service instance.

<P>

A lock service (known in this example as the "young turk") initiates
grantorship transfer by sending a <code>TransferGrantorshipRequest</code>
to the elder.  The elder notes that the young turk is now the grantor
and instructs the current grantor that it is no longer the grantor and
that it should expect to hear from the young turk shortly.  Once the
current grantor has received this message, it will reply to {@link
org.apache.geode.distributed.internal.locks.DLockRequestProcessor.DLockRequestMessage}s with a
response that indicates that it is not the grantor.  Note that the
young turk will not reply to {@link
org.apache.geode.distributed.internal.locks.DLockRequestProcessor.DLockRequestMessage}s until
grantorship has been fully transferred.

<P>

After it has notified the current grantor, the elder replies to the
young turk with the identity of the current grantor.  The young turk,
in turn, requests that the current grantor transfer locking-related
information (for example, which members hold which lock) so that the
young turk knows how to grant new locks.  This direct transfer of
grantorship is an optimization over regular grantor recovery.

<P>

An application (or more likely a Hydra test) that has many distributed
lock service members that simultaneously transfer grantorship may
result in a "chain" of grantors.  That is, the first young turk is
transferring grantorship from the current grantor, but there is also a
line of other young turks that want to become grantor.  In fact, only
the last young turk in line will actually become grantor.  The last
young turk in line is the member that the elder believes is the
grantor.  To ensure that one of the young turks in the middle of the
line does not become grantor, when a young turk that is in the process
of having grantorship transferred to it receives a request for grantor
transfer from another young turk, the older turk gives up its dreams
of becoming grantor and "passes through" the lock information to the
younger turk when that information arrives.  After the older turk
gives up, it replies to any pending {@link
org.apache.geode.distributed.internal.locks.DLockRequestProcessor.DLockRequestMessage}s with
"not grantor".

<P>

There are several interesting scenarios to consider when members crash
during the transfer of grantorship.  In the first scenario, the
current grantor crashes during grantorship transfer.  When the young
turk detects that the current grantor has crashed, it simply performs
regular grantor recovery.  If there are not any younger turks, then
the young turk becomes grantor.

<P>

In the second scenario, a young turk detects that its younger turk has
crashed, thus breaking the chain of young turks.  In this scenario,
the young turk knows that it will never be grantor (because it had a
younger turk) and gives up trying to be grantor because it doesn't
know if there are any younger turks to succeed it.  The younger turk
on the other half of the broken chain will detect that its older turk
has crashed and will initiate grantor recovery.

<P>

<CENTER>
<IMG SRC="{@docRoot}/javadoc-images/turks.jpg" WIDTH="652" HEIGHT="315">
</CENTER>

<H2>Elder Death</H2>

If an elder crashes or disconnects from the distributed system, a new
elder is chosen.  When a member of the distributed system needs to
contact the elder, it consults its JavaGroups view to determine which
member is the eldest.  (Note that we assume that all members of the
distributed system have identical JavaGroups views and will, thus,
choose the same member as the elder.)  The member then sends a
grantor-related message (such as a {@link
org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.GrantorRequestMessage}) to
the member it believes to the elder.  If a member receives a
grantor-related message, it assumes it is the elder and begins to act
as such.

<P>

When a member realizes that it is the Elder, it initiates elder
recovery by sending an <code>ElderInfoRequest</code> to each
member of the distributed system.  The other members reply with the
names of the lock service grantors that they host.  This information
allows the elder to build its mapping of lock services to grantors.
While the elder is initializing, it will not respond to any requests
for grantor information.

<P>

To optimize the common case in which the elder does not leave the
distributed system, we do not require the first member to join a
distributed system to perform elder recovery.  When a member connects
to the distributed system, it keeps track of whether or not there are
any other members.  If there are no other members, this first member
notes this fact and if it becomes the elder, it will not perform elder
recovery because it knows that there was not a previous elder.

<P>

A member lazily determines that it is the elder when it receives a
request for grantor information.  A member could be more eager about
becoming the elder, but if the application is not using the
distributed lock service, the potential elder recovery would be wasted
effort.

</BODY>
</HTML>